On Annotating Learner Corpora
Similar resources
Building and Using Corpora of Non-Native Czech
Investigating language acquisition by non-native learners helps to understand important linguistic issues and to develop teaching methods better suited both to the specific target language and to the learner. These tasks can now be based on empirical evidence from learner corpora. A learner corpus consists of language produced by language learners, typically learners of a second or foreign langua...
Annotating Orthographic Target Hypotheses in a German L1 Learner Corpus
NLP applications for learners often rely on annotated learner corpora. It is therefore important that the annotations be both meaningful for the task and consistent and reliable. We present a new longitudinal L1 learner corpus for German (handwritten texts collected in grades 2–4), which is transcribed and annotated with a target hypothesis that strictly only corrects orthographic errors, and i...
On the Automatic Analysis of Learner Language. Introduction to the Special Issue
Natural language processing (NLP) has long been used to automatically analyze language produced by language learners, typically aimed at providing individualized feedback and learner modeling in Intelligent Computer-Assisted Language Learning systems (cf. Heift & Schulze 2007). While much interesting research has been reported, it is difficult to determine the state of the art for the automatic...
Annotating an Arabic Learner Corpus for Error
This paper describes an ongoing project in which we are collecting a learner corpus of Arabic, developing a tagset for error annotation and performing Computer-aided Error Analysis (CEA) on the data. We adapted the French Interlanguage Database FRIDA tagset (Granger, 2003a) to the data. We chose FRIDA in order to follow a known standard and to see whether the changes needed to move from a Frenc...
Error Annotation of the Arabic Learner Corpus - A New Error Tagset
This paper introduces a new two-level error tagset, AALETA (Alfaifi Atwell Leeds Error Tagset for Arabic), to be used for annotating the Arabic Learner Corpus (ALC). The new tagset includes six broad classes, subdivided into 37 more specific error types or subcategories. It is easily understood by Arabic corpus error annotators. AALETA is based on an existing error tagset for Arabic corpora, ...